monte carlo evaluation
Improving Monte Carlo Evaluation with Offline Data
Monte Carlo (MC) methods are the most widely used methods to estimate the performance of a policy. Given an interested policy, MC methods give estimates by repeatedly running this policy to collect samples and taking the average of the outcomes. Samples collected during this process are called online samples. To get an accurate estimate, MC methods consume massive online samples. When online samples are expensive, e.g., online recommendations and inventory management, we want to reduce the number of online samples while achieving the same estimate accuracy. To this end, we use off-policy MC methods that evaluate the interested policy by running a different policy called behavior policy. We design a tailored behavior policy such that the variance of the off-policy MC estimator is provably smaller than the ordinary MC estimator. Importantly, this tailored behavior policy can be efficiently learned from existing offline data, i,e., previously logged data, which are much cheaper than online samples. With reduced variance, our off-policy MC method requires fewer online samples to evaluate the performance of a policy compared with the ordinary MC method. Moreover, our off-policy MC estimator is always unbiased.
- North America > Canada > Alberta (0.14)
- North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- (6 more...)
On the overestimation of widely applicable Bayesian information criterion
A widely applicable Bayesian information criterion (Watanabe, 2013) is applicable for both regular and singular models in the model selection problem. This criterion tends to overestimate the log marginal likelihood. We identify an overestimating term of a widely applicable Bayesian information criterion. Adjustment of the term gives an asymptotically unbiased estimator of the leading two terms of asymptotic expansion of the log marginal likelihood. In numerical experiments on regular and singular models, the adjustment resulted in smaller bias than the original criterion.
- North America > Canada > Ontario > Toronto (0.15)
- North America > United States > New York (0.05)
- Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Monte Carlo Methods for the Game Kingdomino
Gedda, Magnus, Lagerkvist, Mikael Z., Butler, Martin
Kingdomino is introduced as an interesting game for studying game playing: the game is multiplayer (4 independent players per game); it has a limited game depth (13 moves per player); and it has limited but not insignificant interaction among players. Several strategies based on locally greedy players, Monte Carlo Evaluation (MCE), and Monte Carlo Tree Search (MCTS) are presented with variants. We examine a variation of UCT called progressive win bias and a playout policy (Player-greedy) focused on selecting good moves for the player. A thorough evaluation is done showing how the strategies perform and how to choose parameters given specific time constraints. The evaluation shows that surprisingly MCE is stronger than MCTS for a game like Kingdomino. All experiments use a cloud-native design, with a game server in a Docker container, and agents communicating using a REST-style JSON protocol. This enables a multi-language approach to separating the game state, the strategy implementations, and the coordination layer.
- Europe > United Kingdom > Scotland (0.04)
- North America > United States (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > Netherlands > Limburg > Maastricht (0.04)